but an accumulation of plain, unadorned facts available to anyone's
inspection, it seems useless to try to bolster either of them
up by the dialectic methods of a lawyer's appeal to the jury. 

Raymond Pearl. 
June 21, 1911. 

ON THE FORMATION OF CORRELATION AND CON- 
TINGENCY TABLES WHEN THE NUMBER OF 
COMBINATIONS IS LARGE 

In earlier numbers of this Journal two papers on that use- 
ful tool, the correlation coefficient, have appeared. The first 1 ex- 
plains and illustrates a convenient method of carrying out the 
arithmetical routine of calculation, while the second, by Pro- 
fessor Jennings, 2 describes a method for obtaining the coefficient 
for symmetrical tables without the labor of actually rendering 
the tables themselves symmetrical. 

The purpose of this note is to point out a method of prepar- 
ing correlation tables where the number of combinations is 
large. Such tables are not infrequently needed. Suppose, for 

1 Harris, J. Arthur, "The Arithmetic of the Product Moment Method
of Calculating the Coefficient of Correlation," Amer. Nat., Vol. 44, pp.
693-699, 1910.

In a note on this method of calculating the coefficient of correlation, 
Professor Jennings (Amer. Nat., Vol. 45, p. 413, 1911) suggests reduction
in size of the moments by designating the lowest grade by 0 and the
succeeding ones by 1, 2, 3, ... n. In this he is quite justified. I have
frequently used the scheme he suggests during the last several years, but I
did not refer to it particularly in my note, and for two reasons. First, I 
thought the point sufficiently covered by the statement that the rough 
moments may be taken about any arbitrary point as origin, and by the 
suggestion that when the range is very great it may pay to use the
conventional methods in calculating the standard deviations. Second, according
to my experience it is better, whenever possible, to keep the actual values. 
When one uses a mechanical calculator the arithmetical routine is (after a 
little practise) not out of proportion to the advantages. Under many cir- 
cumstances these are very great: (a) all the values have a direct biological 
(physical) significance, (b) the means of arrays may at once be obtained 
for testing linearity of regression, (c) tables for different lots of material 
may be combined or separated at will by merely summing or segregating 
their moments, and finally (d) I shall show in a forthcoming paper how
these moments, once calculated, may be of much service in obtaining some 
of the more difficult correlations. 

2 Jennings, H. S., "Computing Correlation in Cases where Symmetrical
Tables are Commonly Used," Amer. Nat., Vol. 45, pp. 123-128, 1910.
instance, that one wishes to correlate the hatching quality
of the eggs of sisters in the domestic fowl, as Pearl and Surface 3
have actually done. If each family be composed of only
ten pullets and there be only fifty families the number of entries
in the symmetrical correlation table will be 10 × 9 × 50 = 4,500.
Or again, if one be interested in determining whether the di- 
mensions or proportions of the blood corpuscles differ from in- 
dividual to individual in an animal, say the tadpole, 4 and have 
measurements of twenty-five corpuscles in each of 100 individ- 
uals, he may have to form a correlation table of 60,000 entries. 
Much larger tables than this have been formed. The labor is of 
course excessive, and this has been one of the factors limiting 
their application to problems of morphology, physiology and 
heredity. 
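
The two counts just quoted follow from the fact that a class of n individuals contributes n(n - 1) ordered pairs to a symmetrical table. A minimal sketch in Python confirms both figures; the function name is illustrative only and does not come from the original note.

```python
def symmetric_table_entries(class_sizes):
    """Total entries in a symmetrical correlation table.

    Each class (family, individual) of n members contributes n * (n - 1)
    ordered subject-relative pairs, since no member is paired with itself.
    """
    return sum(n * (n - 1) for n in class_sizes)


# Fifty families of ten pullets each.
fowl_entries = symmetric_table_entries([10] * 50)

# One hundred tadpoles with twenty-five corpuscles measured in each.
tadpole_entries = symmetric_table_entries([25] * 100)

print(fowl_entries, tadpole_entries)  # prints "4500 60000"
```

The same function accepts classes of unequal size, which is the usual case in practice.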

In many cases the routine, as I have found from considerable 
experience, can be profitably carried out as follows. 

The individuals of each class are seriated separately and the
frequencies entered in horizontal rows in a table of vertical columns,
each devoted to one of the grades of variates, $g_{m+1}$, $g_{m+2}$,
. . . $g_{m+n}$. A second table, exactly like the first in width of column
and row, is prepared and cut into strips by columns. Each
of these columns is moved successively across the surface of the
original table, and the frequencies which are in juxtaposition
are multiplied together and their products summed and entered
on a correlation blank, in the compartment corresponding to the
captions of the two columns. This is repeated for all the columns
except the one identical with the strip. If the strip be for
grade $g_{m+2}$, the multiplications and summations from once passing
it over the original table give the whole relative array
associated with it as subject, except the frequencies of the diagonal
cell, $g_{m+2}$-$g_{m+2}$. 5 To obtain these each frequency on a
strip is multiplied by itself less one and the products summed.

It is not absolutely necessary, since the table is symmetrical, 

3 Pearl, R., and Surface, F. M., "Data on Certain Factors Influencing
the Fertility and Hatching of Eggs," Bull. Me. Ag. Exp. Sta., No. 168,
pp. 147-151, 1909.

4 For actual cases, see K. Pearson, "A Biometric Study of the Red
Blood Corpuscles of the Common Tadpole (Rana temporaria) from the
Measurements of Ernest Warren," Biometrika, Vol. 6, pp. 402-419, 1909.

5 Diagonal cell is the term applied to a compartment of a row extending 
diagonally across the correlation surface. In symmetrical tables they con- 
tain the frequencies for identical values of the subject and relative. 
to obtain the products of the frequencies of all of the columns by
all the strips, but by doing so a check is obtained for all entries 
except those of the diagonal cell. 

The great advantage of this method is that it replaces mental 
and pencil drudgery with rapid mechanical calculation. Clip- 
ping the movable column by the side of the one with which it 
is to be compared in the table, one can obtain the products and 
the sum of the products simultaneously on a Brunsviga, 6 by
merely multiplying the successive pairs of frequencies together 
and allowing the products to accumulate. Of course the fre- 
quencies for the diagonal cell can be quickly obtained, by sum- 
ming the n(n — 1) values for the individual column, in pre- 
cisely the same manner. 
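
The strip procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's own routine: the function names and the small data set are invented for the example, and a direct enumeration of pairs is included only to check the result.

```python
from itertools import permutations


def seriate(values, grades):
    """Frequency row for one class: how many members fall in each grade."""
    return [sum(1 for v in values if v == g) for g in grades]


def strip_method(rows):
    """Build the symmetrical table from per-class frequency rows.

    Off the diagonal, the cell for columns a and b is the sum over classes
    of f[a] * f[b]; on the diagonal it is the sum of f[a] * (f[a] - 1).
    """
    k = len(rows[0])
    table = [[0] * k for _ in range(k)]
    for a in range(k):          # the movable strip (column a)
        for b in range(k):      # the column of the fixed table
            if a == b:
                table[a][b] = sum(f[a] * (f[a] - 1) for f in rows)
            else:
                table[a][b] = sum(f[a] * f[b] for f in rows)
    return table


def brute_force(classes, grades):
    """Enter every ordered pair of distinct members one at a time."""
    index = {g: i for i, g in enumerate(grades)}
    k = len(grades)
    table = [[0] * k for _ in range(k)]
    for members in classes:
        for x, y in permutations(members, 2):
            table[index[x]][index[y]] += 1
    return table


# Hypothetical data: three classes with 4, 4 and 5 members.
classes = [[3, 3, 4, 5], [4, 4, 4, 6], [3, 5, 5, 6, 6]]
grades = [3, 4, 5, 6]
rows = [seriate(c, grades) for c in classes]
table = strip_method(rows)
total = sum(sum(row) for row in table)
```

Computing both the (a, b) and the (b, a) cell, as the sketch does, mirrors the check mentioned in the text: the two must agree, since the table is symmetrical.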

Purely as an illustration of method the intra-individual or 
homotypic correlation for number of seeds developing per pod 
in a series of Broom plants (Cytisus scoparius) 7 collected at
Woods Hole in the late summer of 1907 will now be determined. 



TABLE I 
Seeds per Pod for Twenty-three Individual Plants
6 A Comptometer will also do.

7 The variability of these has already been compared with that of Pearson's
English series. See "Variation in the Number of Seeds per Pod in
the Broom, Cytisus scoparius," Amer. Nat., Vol. 43, pp. 350-355, 1909.
The original data appear seriated by individual plants in
Table I. From this we derive the symmetrical intra-individual 
correlation surface, Table II. 

Working by the conventional product moment method, but
taking all moments around 0 as suggested elsewhere in these
pages, 8 we get:

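$$S(x') = 2{,}179{,}781, \qquad \nu_1' = 10.0524 = A,$$

$$S(x'^2) = 24{,}578{,}235, \qquad \nu_2' = 113.346284,$$

$$S(x'y') = 22{,}438{,}814, \qquad \mu_2 = 12.295679 = \sigma^2,$$

$$N = 216{,}842, \qquad S(x'y')/N = 103.480018.$$

The y moments are of course the same as the x, and we have:

$$r = \frac{S(xy)}{N\sigma_x\sigma_y} = \frac{S(x'y')/N - \nu_{1x}'\,\nu_{1y}'}{\sigma_x\sigma_y} = \frac{S(x'y')/N - A^2}{\mu_2} = .198.$$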
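
The arithmetic can be retraced from the four quoted totals alone. The short Python sketch below does so; the variable names are chosen for the example, and the only inputs are the sums and N printed above.

```python
# Totals quoted in the text, all moments taken about 0.
N = 216_842
sum_x = 2_179_781      # S(x')
sum_x2 = 24_578_235    # S(x'^2)
sum_xy = 22_438_814    # S(x'y')

mean = sum_x / N                    # first rough moment, A
variance = sum_x2 / N - mean ** 2   # second moment about the mean
# The table is symmetrical, so x and y share one mean and one variance.
r = (sum_xy / N - mean ** 2) / variance

print(round(mean, 4), round(r, 3))
```

The mean rounds to 10.0524 and the coefficient to .198, in agreement with the printed values.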

Or we may use a short formula for the difference method. 9

$$r = 1 - \frac{1}{2}\,\frac{\sigma_v^2}{\sigma^2} = 1 - .8024 = .198. \ {}^{10}$$

where $\sigma_v$ is the standard deviation of the difference between
pairs.
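
That the difference formula must agree with the product moment result in a symmetrical table can be illustrated on a small invented data set. The sketch below assumes the formula has the form r = 1 - (variance of differences) / (2 × variance of x); the data and names are hypothetical.

```python
from itertools import permutations

# Hypothetical classes; every ordered pair of distinct members is an entry.
classes = [[3, 3, 4, 5], [4, 4, 4, 6], [3, 5, 5, 6, 6]]
pairs = [p for members in classes for p in permutations(members, 2)]
n = len(pairs)

mean = sum(x for x, _ in pairs) / n
variance = sum(x * x for x, _ in pairs) / n - mean ** 2

# Product moment coefficient, moments taken about 0.
r_product_moment = (sum(x * y for x, y in pairs) / n - mean ** 2) / variance

# Difference method: in a symmetrical table the mean difference is zero,
# so the variance of the differences is simply their mean square.
var_diff = sum((x - y) ** 2 for x, y in pairs) / n
r_difference = 1 - var_diff / (2 * variance)

print(r_product_moment, r_difference)
```

The two values coincide up to rounding, because the mean square difference equals twice the variance less twice the product moment about the mean.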

Where the number of individuals in an array is very small 
the method presents no very marked advantages, but when the 
arrays are large it may be very useful and its range of applica- 
bility very wide. 

For instance, one of the tests of the genotype theory of inheritance
is to compare the correlation between parents and offspring
with that between the parents' co-fraternity and the offspring
in a population of self-fertilizing or vegetatively

8 Amer. Nat., Vol. 44, pp. 693-699, 1910.

9 Harris, J. Arthur, "A Short Method of Calculating the Coefficient of
Correlation in the Case of Integral Variates," Biometrika, Vol. 7, pp. 215-218,
1909.

10 On the basis of N = 2239, r = .198 ± .014, and we may conclude that
the individual plants are slightly but distinctly differentiated in their
capacity for maturing seeds. In his English Series of Broom, Pearson
(Phil. Trans. Roy. Soc. Lond., A, Vol. 197, p. 335, 1901) found r = .42.
The difference may easily be attributed to the smallness of the Woods Hole
series, some, or possibly all, of the individuals of which may have come
from the same parent plant. The case is an illustration of arithmetical
method only.
propagating individuals. 11 The correlation surfaces are very
easily prepared. Two seriation tables, one for the arrays from 
which the individual parents were drawn and one for the off- 
spring arrays corresponding to each parental fraternity, are 
prepared. The first table is cut into strips by columns, passed 
strip by strip over the offspring seriation table, the frequencies 
which are in juxtaposition are multiplied together and summed 
simultaneously, and the resulting totals entered in the proper
compartments 12 of a correlation table. This may be called an
ascendant-descendant correlation surface. It includes both 
"parental" and "avuncular" relationships. The "avuncular" 
relationship is the one sought, and is quickly gotten by subtracting
the surface for the relationship between individual parents
and their offspring (which will have been already prepared for 
other purposes) from the ascendant correlation surface just de- 
scribed. 
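
The subtraction just described may be sketched as follows. The data layout (a list of families, each with a parental fraternity and the offspring of those members that were parents) is an assumption made for the example, as are all names and values; the note itself specifies only the two seriation tables and the final subtraction.

```python
def seriate(values, grades):
    """Frequency row for one array over the ordered grades."""
    return [sum(1 for v in values if v == g) for g in grades]


def cross_products(first_rows, second_rows):
    """Sum over families of f[a] * o[b]: the strip method on two tables."""
    k, m = len(first_rows[0]), len(second_rows[0])
    return [[sum(f[a] * o[b] for f, o in zip(first_rows, second_rows))
             for b in range(m)] for a in range(k)]


# Hypothetical families: "offspring" maps a parent's position in the
# fraternity to the values of that parent's offspring.
families = [
    {"fraternity": [3, 4, 4], "offspring": {0: [3, 4], 2: [4, 5]}},
    {"fraternity": [5, 5, 6], "offspring": {1: [5, 6, 6]}},
]
grades = [3, 4, 5, 6]
index = {g: i for i, g in enumerate(grades)}

fraternity_rows = [seriate(f["fraternity"], grades) for f in families]
offspring_rows = [
    seriate([v for kids in f["offspring"].values() for v in kids], grades)
    for f in families
]

# Ascendant-descendant surface: every fraternity member with every offspring.
ascendant = cross_products(fraternity_rows, offspring_rows)

# Parental surface: each parent with its own offspring only.
parental = [[0] * len(grades) for _ in grades]
for f in families:
    for position, kids in f["offspring"].items():
        for v in kids:
            parental[index[f["fraternity"][position]]][index[v]] += 1

# Avuncular surface by subtraction, cell by cell.
avuncular = [[a - p for a, p in zip(row_a, row_p)]
             for row_a, row_p in zip(ascendant, parental)]
```

In this toy case the ascendant surface holds 21 entries, the parental surface 7, and the avuncular surface the remaining 14.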

In a forthcoming paper I shall show how various correlations 
may sometimes be most easily determined from the first two 
moments for the individual classes or families without the labor 
of drawing up tables.

J. Arthur Harris.

Cold Spring Harbor, N. Y., 
July 7, 1911 

ACQUIRED CHARACTERS DEFINED 

It is believed that if the term "acquired characters" is care- 
fully defined, and the matter considered in view of that defini- 
tion, a new light will be cast upon a generally misunderstood 
subject. The things to be defined are the verb to acquire, which
means to obtain by effort, and the noun character, which means 
something forming part of an individual. The point of view 
here involved may be illustrated by the following quotation : 

"Some are born great,
Some achieve greatness, and
Some have greatness thrust upon them."

This shows three ways in which an individual obtains great- 
ness. The same three ways apply to the different characters 

11 For an illustration see K. Pearson's analysis of Hanel's data for
Hydra grisea, "Darwinism, Biometry and Some Recent Biology,"
Biometrika, Vol. 7, pp. 368-385, 1910.

12 The compartments corresponding to the captions of the two columns 
dealt with. 



